Abstract

Recent advances in causal inference have given rise to a general and easy-to-use formula
for assessing the extent to which the effect of one variable on another is mediated by a
third. This Mediation Formula is applicable to nonlinear models with both discrete and
continuous variables, and permits the evaluation of path-specific effects with minimal
assumptions regarding the data-generating process. We demonstrate the use of the Mediation
Formula in simple examples and illustrate why parametric methods of analysis yield
distorted results, even when parameters are known precisely. We stress the importance of
distinguishing between the necessary and sufficient interpretations of “mediated-effect”
and show how to estimate the two components in nonlinear systems with continuous and
categorical variables.
1 Introduction

Consider a randomized clinical trial in which an intervention X shows a significant effect
on an outcome Y. A question that invariably comes to investigators’ minds is: How and why
does the intervention produce the effect, or, more specifically, can the effect of X on Y
be attributed to its effect on some intermediate variable Z standing between the two? The
reasons we are concerned with such questions are both scientific and practical.
Scientifically, mediation tells us “how nature works” and, practically, it enables us to
predict behavior under a rich variety of conditions and interventions. For example, an
investigator interested in preventing Y may wish to assess the extent to which Y could be
prevented by changing an intermediate variable, Z, standing between X and Y, or modifying
some intermediate process between X and Z (MacKinnon, 2008, Ch. 2; Bullock, Green, & Ha,
2010).

For the past few decades the analysis of mediation has been dominated by linear regression
paradigms, most notably the one advanced by Baron and Kenny (1986), which can be stated as
follows: To test the contribution of a given mediator Z to the effect of X on Y, first
regress Y on X and estimate the regression coefficient $R_{yx}$, to be equated with the
total effect. Second, include Z in the regression and estimate the partial regression
coefficient $R_{yx \cdot z}$ when Z is “controlled for” (or “conditioned on” or “adjusted
for”). The difference between the two slopes, $R_{yx} - R_{yx \cdot z}$, would then measure
the reduction in the total effect due to controlling for Z and should quantify the effect
mediated through Z.

Figure 1: (a) A single mediator Z contributing $\beta \times \gamma$ to the overall effect.
(b) Multiple mediators, each contributing $\beta_i \times \gamma_i$. The error terms
$\epsilon_1, \epsilon_2, \epsilon_3$ represent factors omitted from the analysis.

The intuition behind this scheme is demonstrated in Fig. 1(a), which shows a linear
structural equation model governing the causal relationships between X, Y, and Z. If the
total effect of X on Y through both pathways is $\tau = \alpha + \beta\gamma$, then by
adjusting for Z we sever the Z-mediated path and the effect is reduced to $\alpha$. The
difference between the two regression slopes gives

$$\tau - \alpha = \beta\gamma \tag{1}$$

and the product $\beta\gamma$ is what we expect the Z-mediated effect to be.

Alternatively, one can venture to estimate $\beta$ and $\gamma$ independently of $\tau$.
This is done by first estimating the regression slope of Z on X to get $\beta$, then
estimating the regression slope of Y on Z controlling for X, which gives us $\gamma$;
multiplying the two slopes together gives us the mediated effect $\beta\gamma$. The scheme
generalizes naturally to multi-path models, as shown in Fig. 1(b), which represents an
opportunity to intervene on three mediating variables, or any subset thereof. The
difference between the total effect $\tau$ and the effect measured after adjusting for
mediator $Z_i$ gives the extent to which the indirect path through $Z_i$ contributes to the
overall effect, $\tau$. Again, this can be estimated either by the
difference-in-coefficients or the product-of-coefficients method.

The validity of these two methods depends, of course, on the assumption that the error
terms $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$ are uncorrelated; otherwise, some of the
structural parameters $\alpha$, $\beta$ and $\gamma$ would not be estimable by regression
analysis and both the difference-in-coefficients and product-of-coefficients methods would
produce biased results. In randomized trials, where $\epsilon_1$ can be identified with the
randomization device, we are assured that $\epsilon_1$ is uncorrelated with $\epsilon_2$
and $\epsilon_3$ and, so, the regressional estimates of $\tau$ and $\beta$ will be
unbiased. However, randomization does not remove correlations between $\epsilon_2$ and
$\epsilon_3$ and, if such exist, adjusting for Z will create spurious correlation between
X and Y which will be added to $\tau$ and would prevent the proper estimate of $\gamma$ or
$\alpha$. In other words, the regression coefficient $R_{yz \cdot x}$ would no longer equal
$\gamma$ and the difference $R_{yx} - R_{zx} R_{yz \cdot x}$ would no longer equal
$\alpha$. This follows from the fact that “controlling” or “adjusting” for Z in the
analysis (by including Z in the regression equation) does not physically disable the paths
going through Z; it merely matches samples with equal Z values, and thus induces spurious
correlations among other factors in the analysis (see Pearl, 1998; Cole & Hernan, 2002;
VanderWeele & Vansteelandt, 2009).^1 Such correlations cannot be detected by statistical
means and, so, theoretical knowledge must be invoked to identify the sources of these
correlations and, if possible, control for common causes (so-called “confounders”) of Z
and Y.^2 Remarkably, the regressional estimates of the difference-in-coefficients and the
product-of-coefficients will always be equal.^3
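The equality of the two regressional estimates can be checked numerically. The sketch below
is an illustration only: the simulated data, the variable names, and the helper
`ols_slopes` are assumptions introduced here, not part of the original analysis. The data
are deliberately nonlinear and confounded, to show that the identity among OLS
coefficients holds even where neither estimate has a causal meaning.

```python
import numpy as np

def ols_slopes(y, *regressors):
    """OLS slope coefficients of y on the given regressors (intercept included)."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]  # drop the intercept

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
u = rng.normal(size=n)                      # hypothetical confounder of Z and Y
z = np.tanh(x) + u + rng.normal(size=n)     # nonlinear mediator equation
y = x * z + 2.0 * u + rng.normal(size=n)    # interaction plus confounding

r_yx = ols_slopes(y, x)[0]                  # R_yx
r_zx = ols_slopes(z, x)[0]                  # R_zx
r_yx_z, r_yz_x = ols_slopes(y, x, z)        # R_yx.z and R_yz.x

difference = r_yx - r_yx_z                  # difference-in-coefficients
product = r_zx * r_yz_x                     # product-of-coefficients
print(difference, product)
```

The two printed numbers agree up to floating-point error, although in this simulated model
neither one measures a mediated effect.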

This approach to mediation (often associated with Baron and Kenny) has two major
drawbacks. One (mentioned above) is its reliance on the untested assumption of uncorrelated
errors, and the second is its reliance on linearity and, in particular, on a property of
linear systems called “effect constancy” (or “no interaction”): The effect of one variable
on another is independent of the level at which we hold a third. This property does not
extend to nonlinear systems; the level at which we control Z would in general modify the
effect of X on Y. For example, if the output Y requires both X and Z to be present, then
holding Z at zero would disable the effect of X on Y, while holding Z at a high value would
enable the latter.

As a consequence, additions and multiplications are not self-evident in nonlinear systems.
It may not be appropriate, for example, to define the indirect effect in terms of the
“difference” in the total effect, with and without control. Nor would it be appropriate to
multiply the effect of X on Z by that of Z on Y (keeping X at some level); multiplicative
compositions demand their justifications. Indeed, all attempts to define mediation by
generalizing the difference and product strategies to nonlinear systems have resulted in
distorted and irreconcilable results (MacKinnon, Fairchild, & Fritz, 2007; MacKinnon,
Lockwood, Brown, Wang, & Hoffman, 2007; Glynn, 2009; Pearl, 2011a).

This paper describes a recently developed method that removes these nonlinear barriers and
makes mediation analysis available to a large space of new applications, especially those
involving categorical data and highly nonlinear processes. The first limitation, the
requirement of error independence (or “no unmeasured confounders,” as it is often called),
will remain intact, and should be kept in mind throughout our discussion.^4 Our focus in
the sequel, however, will be on crossing the linear-to-nonlinear barrier, using the same
causal assumptions that support the standard linear analysis of Baron and Kenny (1986).

^1 This can be seen vividly by setting $\alpha = \gamma = 0$, implying zero direct and
indirect effects; yet, if $\epsilon_2$ and $\epsilon_3$ are correlated, the regression
coefficient $R_{yx \cdot z}$ will not equal zero, but
$-\beta\,\mathrm{cov}(\epsilon_2, \epsilon_3)/\mathrm{var}(\epsilon_2)$.

^2 Although Judd and Kenny (1981) recognized the importance of controlling for
mediator-output confounders, the point was not mentioned in the influential paper of Baron
and Kenny (1986) and, as a result, it has been ignored by most researchers in the social
and psychological sciences (Judd & Kenny, 2010).

^3 This follows from the fact that the regressional image of (1),
$R_{yx} - R_{yx \cdot z} = R_{zx} R_{yz \cdot x}$, is a universal identity among regression
coefficients of any three variables, and has nothing to do with causation or mediation. It
will continue to hold regardless of whether confounders are present, whether the underlying
model is linear or nonlinear, or whether the arrows in the model of Fig. 1(a) point in the
right direction. The equality also holds among the OLS estimates of these parameters
regardless of sample size (Hahn & Pearl, 2011). Note the essential distinction between
structural and regressional parameters, often conflated by some writers (Sobel, 2008;
Rubin, 2010); the former convey causal relationships, the latter are purely statistical.
Conditions for their equality can be found in Pearl (2009, p. 150).

2  Total,  direct  and  indirect  effects 

Figure 2: A generic model depicting mediation through Z with no confounders.

Consider the nonlinear version of the mediation model, as depicted in Fig. 2. In the most
general case, the corresponding structural equations would have the form:

$$x = F_1(\epsilon_1), \qquad z = F_2(x, \epsilon_2), \qquad y = F_3(x, z, \epsilon_3) \tag{2}$$

where X, Y, Z are discrete or continuous random variables, $F_1$, $F_2$, and $F_3$ are
arbitrary functions, and $\epsilon_1, \epsilon_2, \epsilon_3$ represent omitted factors
which are assumed to be mutually independent yet arbitrarily distributed. Aside from this
qualitative independence assumption, the model also hypothesizes the direction of causal
influences, which are often discernible from temporal information or theoretical knowledge.
Notably, the model allows for the existence of millions of unobserved subprocesses that
make up the functions $F_1$, $F_2$, and $F_3$; these do not alter questions concerning the
mediating role of Z.

Since the functions $F_1$, $F_2$, and $F_3$ are unknown to investigators, mediation
analysis commences by first defining total, direct and indirect effects in terms of those
functions and, then, expressing them in terms of the available data, which we assume is
given in the form of random samples $(x, y, z)$ drawn from the joint probability
distribution $P(x, y, z)$.

2.1  Total  effect 

Among the three types of effects considered here, the easiest to define and estimate is the
total effect, $TE_{0,1}$, which measures the change in Y produced by a unit change in X,
say from X = 0 to X = 1. The status of Z need not be specified in this definition, since Z
is allowed to track the changes in X and, so, we have for the total effect:

$$TE_{0,1}(\epsilon_2, \epsilon_3) = F_3[1, F_2(1, \epsilon_2), \epsilon_3] - F_3[0, F_2(0, \epsilon_2), \epsilon_3]$$

where $\epsilon_2$ and $\epsilon_3$ are expected to vary from individual to individual. At
the population level, we will define the total effect $TE_{0,1}$ to be the expectation of
the difference above taken over $\epsilon_2$ and $\epsilon_3$, which (assuming independent
errors) gives:

$$TE_{0,1} = E(Y \mid X = 1) - E(Y \mid X = 0) \tag{3}$$

where $E(Y \mid X = 1)$ is the expected value of Y when X equals 1. This difference is none
other than the regression slope of Y on X, commonly estimated by OLS, which provides an
unbiased estimate of the total effect regardless of the functional form of $F_2$ and $F_3$
(Pearl, 2009, p. 72).

^4 A complete set of techniques is now available for neutralizing error dependencies,
whenever possible, both by covariate adjustment and through the use of instrumental
variables (Shpitser & Pearl, 2008; Pearl, 2009; Tian & Shpitser, 2010). These techniques
are directly applicable to the analysis of mediation (Pearl, 2009, p. 128; Pearl, 2011a;
Shpitser & VanderWeele, 2011), but are beyond the scope of this paper.

More generally, however, if we are interested in the total effect of a transition from
X = x to X = x', where x and x' are any two levels of X (say two dosage levels of a drug),
we write:

$$TE_{x,x'} = E(Y \mid X = x') - E(Y \mid X = x). \tag{4}$$

Clearly, in nonlinear systems, both the baseline X = x and the endpoint X = x' may play a
role in affecting the change of Y.

2.2  Controlled  and  natural  direct  effects 

The idea of estimating the direct effect of X on Y by controlling for Z is applicable to
nonlinear models as well since, assuming $\epsilon_2$ and $\epsilon_3$ are independent,
conditioning on Z simulates the physical action of “fixing” or “setting” Z at a constant
value, z, thus preventing X from transmitting its change along the mediating path
X → Z → Y. The resulting estimand is called the “controlled direct effect” (Robins &
Greenland, 1992; Pearl, 2001):

$$CDE(z) = E(Y \mid X = 1, Z = z) - E(Y \mid X = 0, Z = z) \tag{5}$$

which is the regression slope of Y on X keeping Z constant at z.^5

However, the question arises: at what value should we set Z? As remarked earlier, different
settings of Z would yield different results. For example, assume that X stands for a drug
taken to cure a disease Y. As a side effect, X also stimulates the secretion of an enzyme Z
that hastens the process through which the drug acts on the disease. If we fix Z at a high
level, the drug will appear highly efficacious, while if we fix Z at a low level, the drug
will have only a meager effect. The question remains, therefore, at what value of Z should
we conduct our analysis if we wish to evaluate the direct effect of the drug on the
disease, unmediated by Z?

^5 The general causal expression for CDE(z), which does not assume error-independence, is
given by:

$$CDE(z) = E[Y \mid do(X = 1, Z = z)] - E[Y \mid do(X = 0, Z = z)]$$

(see Pearl, 2009, p. 127) or, using the structural equations of Eq. (2),

$$CDE(z) = E[F_3(1, z, \epsilon_3)] - E[F_3(0, z, \epsilon_3)]$$

A necessary and sufficient condition for estimating CDE(z) in observational studies (in the
presence of unobserved confounders) can be derived using do-calculus (Pearl, 2009,
pp. 85-88), and is given in Shpitser and Pearl (2008) and Tian and Shpitser (2010).
One can report, of course, the value of CDE(z) for each level Z = z, and let the user
choose the value that matches the intervention policy under consideration. In many cases,
however, the policy informed by the direct effect is not one where Z is set to a uniform
level for all units in the population but, rather, one where the sensitivity of Z to X is
suppressed or enhanced, not Z itself. Taking the enzyme example above, a policy maker may
be interested in the benefit of developing a cheaper drug, identical to the one studied,
but lacking the potential to stimulate enzyme secretion. Absent the mediating effect of Z,
the efficacy of the new drug will be determined by whatever level Z attains naturally in
the population, varying from individual to individual, not set uniformly by external
control.

Under such settings, it is more meaningful to define a notion of direct effect called
natural, that does not require setting Z uniformly over the population, but lets it vary
from individual to individual. This notion, denoted $NDE_{x,x'}(Y)$, is defined as the
expected change in Y induced by changing X from x to x' while keeping all mediating factors
constant at whatever value they would have obtained under X = x, before the transition from
x to x' (Robins & Greenland, 1992; Pearl, 2001).^6 This definition of direct effect invokes
the phrase “at whatever value they would have obtained,” which is counterfactual, and thus
circumvents ideological prohibitions, upheld by some statisticians, against attributing
mediation to non-manipulable variables (Pearl, 2011b). At the same time, because of its
counterfactual character, the natural direct effect cannot be given a direct empirical
test; there is no way to rerun history and measure subjects’ responses under conditions
they have not actually experienced. It has been shown nevertheless (Pearl, 2001) that, for
the confounding-free model of Fig. 2, the natural direct effect can be estimated from
population data^7 and is given by:

$$NDE_{x,x'}(Y) = \sum_z [E(Y \mid X = x', Z = z) - E(Y \mid X = x, Z = z)]\, P(Z = z \mid X = x)$$

or, using a short-hand notation, we write:

$$NDE_{x,x'}(Y) = \sum_z [E(Y \mid x', z) - E(Y \mid x, z)]\, P(z \mid x) \tag{6}$$

The intuition is simple: the natural direct effect is the weighted average of the
controlled direct effect, using the pre-transition distribution $P(z \mid x)$ as a
weighting function. Equation (6) can be estimated by a two-step regression, as will be
shown in the sequel.
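As a minimal sketch of Eq. (6) for discrete variables, the function below takes the
conditional expectations $E(Y \mid x, z)$ and the conditional distributions $P(z \mid x)$
as lookup tables and returns the weighted average of controlled direct effects. The
function name, the table layout, and all numbers are hypothetical, chosen only for
illustration.

```python
def natural_direct_effect(g, p_z_given_x, x, x_new):
    """Eq. (6): sum over z of [E(Y|x',z) - E(Y|x,z)] * P(z|x).

    g[(x, z)]         -- conditional expectation E(Y | X=x, Z=z)
    p_z_given_x[x][z] -- conditional probability P(Z=z | X=x)
    """
    return sum(
        (g[(x_new, z)] - g[(x, z)]) * prob   # controlled direct effect at z, weighted
        for z, prob in p_z_given_x[x].items()
    )

# Hypothetical binary example (illustrative numbers, not taken from the paper).
g = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.8}
p_z_given_x = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

nde = natural_direct_effect(g, p_z_given_x, x=0, x_new=1)
print(nde)
```

Here the controlled direct effects are 0.2 at z = 0 and 0.5 at z = 1, and the weights 0.6
and 0.4 are those of the pre-transition distribution $P(z \mid X = 0)$.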

^6 Using the structural model of Eq. (2), the formal definition of the natural direct
effect reads:

$$NDE_{x,x'}(Y) = E[F_3(x', F_2(x, \epsilon_2), \epsilon_3)] - E[F_3(x, F_2(x, \epsilon_2), \epsilon_3)]$$

Robins and Greenland (1992) called this notion of direct effect “Pure” while Pearl called
it “Natural,” to stress the natural, unperturbed distribution of values,
$Z = F_2(x, \epsilon_2)$, at which we “freeze” Z while changing X from X = x to X = x'. For
discussions regarding policy implications of NDE versus CDE, see (Pearl, 2001; Robins,
2003; Joffe et al., 2007; Hafeman and Schwartz, 2009; Pearl, 2009, p. 132; Kaufman, 2010;
Robins and Richardson, 2011; Albert and Nelson, 2011).

^7 In the presence of measured and unmeasured confounders, the general conditions under
which NDE is estimable from population data are somewhat more stringent than those needed
for CDE (footnote 5). For details see Pearl (2001); Avin, Shpitser, and Pearl (2005);
Petersen, Sinisi, and van der Laan (2006); Robins (2003); VanderWeele (2009); Kaufman
(2010); Robins and Richardson (2011); Shpitser and VanderWeele (2011).
2.3 Indirect effects

Remarkably, the counterfactual definition of the natural direct effect can be turned around
to provide an operational definition for the indirect effect, a concept shrouded in mystery
and controversy because it is impossible, by controlling any of the variables in the model,
to selectively disable the direct link from X to Y so as to let X influence Y solely via
indirect paths. Therefore the indirect effect has no “controlled” interpretation.

The indirect effect, IE, of the transition from x to x' is defined as the expected change
in Y effected by holding X constant, at X = x, and changing Z (for each individual) to
whatever value it would have attained had X been set to X = x'. Formally, this
counterfactual definition reads:

$$IE_{x,x'}(Y) = E[F_3(x, F_2(x', \epsilon_2), \epsilon_3)] - E[F_3(x, F_2(x, \epsilon_2), \epsilon_3)] \tag{7}$$

which is similar to the definition of the natural direct effect (footnote 6) save for
exchanging x with x' in the first term.

Assuming again the confounding-free model of Fig. 2, the indirect effect defined in (7) can
be reduced to an estimable expression (Pearl, 2001), given by:

$$IE_{x,x'}(Y) = \sum_z E(Y \mid x, z)\,[P(z \mid x') - P(z \mid x)]. \tag{8}$$

The intuition here is quite different and unveils a nonparametric version of the
product-of-coefficients strategy. The term $E(Y \mid x, z)$ plays the role of $\gamma$ in
Fig. 1(a), for it specifies how Y responds to Z for any fixed x, and the difference
$P(z \mid x') - P(z \mid x)$ plays the role of $\beta$, for it captures the impact of the
transition from x to x' on the probability of Z. We see that what was a simple product
operation in linear systems is here replaced by a composition operator that involves
summation over all values of Z.

Equation (8) provides a general formula for mediation effects, applicable to any nonlinear
system, any distribution, and any type of variables. Moreover, the formula is readily
estimable by regression. Owing to its generality and ubiquity, I have referred to this
expression as the “Mediation Formula” (Pearl, 2009, 2010).
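For discrete variables, the Mediation Formula of Eq. (8) can be sketched in a few lines.
The function name, the lookup-table layout, and the numbers below are assumptions made for
illustration; the tables stand in for quantities that would be estimated from data.

```python
def indirect_effect(g, p_z_given_x, x, x_new):
    """Eq. (8): sum over z of E(Y|x,z) * [P(z|x') - P(z|x)].

    g[(x, z)]         -- conditional expectation E(Y | X=x, Z=z)
    p_z_given_x[x][z] -- conditional probability P(Z=z | X=x)
    """
    return sum(
        g[(x, z)] * (p_z_given_x[x_new][z] - p_z_given_x[x][z])
        for z in p_z_given_x[x]
    )

# Hypothetical binary example (illustrative numbers, not taken from the paper).
g = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.8}
p_z_given_x = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

ie = indirect_effect(g, p_z_given_x, x=0, x_new=1)
print(ie)
```

Note that X is held at its baseline value inside $E(Y \mid x, z)$ while only the
distribution of Z shifts from $P(z \mid x)$ to $P(z \mid x')$.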

Not surprisingly, owing to the nonlinear nature of the model, the relationship between the
total, direct and indirect effects is non-additive. The total effect TE of a transition is
in fact the difference (not the sum) between the direct effect and the indirect effect of
the reverse transition. Formally,

$$TE_{x,x'}(Y) = NDE_{x,x'}(Y) - IE_{x',x}(Y) \tag{9}$$

where $IE_{x',x}(Y)$ stands for the indirect effect of the transition from X = x' to X = x.
In linear systems, where reversal of transitions amounts to negating the signs of their
effects, we have $IE_{x,x'} = \beta\gamma(x' - x) = -IE_{x',x}$ and the standard additive
formula prevails. In general, however, additivity is a rare occurrence and it is the
difference formula in Eq. (9) that governs the relation between the total, direct and
indirect effects of the transition from x to x'.
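The difference formula of Eq. (9) can be verified on a small discrete example. The sketch
below uses hypothetical lookup tables for $E(Y \mid x, z)$ and $P(z \mid x)$ (all names and
numbers are illustrative assumptions) and compares the total effect of Eq. (4) with both
the difference formula and the naive additive formula.

```python
# Hypothetical binary example (illustrative numbers, not taken from the paper).
g = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.8}        # E(Y | x, z)
p = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}                # P(z | x)

def expected_y(x):
    """E(Y | X=x) = sum_z E(Y|x,z) P(z|x)."""
    return sum(g[(x, z)] * p[x][z] for z in p[x])

def nde(x, x_new):
    """Eq. (6): natural direct effect of the transition x -> x_new."""
    return sum((g[(x_new, z)] - g[(x, z)]) * p[x][z] for z in p[x])

def ie(x, x_new):
    """Eq. (8): indirect effect of the transition x -> x_new."""
    return sum(g[(x, z)] * (p[x_new][z] - p[x][z]) for z in p[x])

te = expected_y(1) - expected_y(0)          # Eq. (4) with x = 0, x' = 1
difference_formula = nde(0, 1) - ie(1, 0)   # Eq. (9): uses the REVERSE transition
additive_formula = nde(0, 1) + ie(0, 1)     # naive sum, valid only without interaction

print(te, difference_formula, additive_formula)
```

In this example the first two quantities coincide while the additive sum falls short of the
total effect, because $E(Y \mid x, z)$ contains an interaction between x and z.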

In the rest of the paper we will drop the letter ‘N’ from the acronym NDE, with the
understanding that DE stands for the natural direct effect estimand given by Eq. (6), and
use the acronyms TE, DE and IE for the total, direct and indirect effects, respectively.
3 The Mediation Formula: A Simple Solution to a Thorny Problem

This section demonstrates how the Mediation Formula of Eq. (8) can be applied in assessing
mediation effects in nonlinear models. We will use the standard mediation model of Fig. 2,
where all error terms are assumed to be mutually independent, with the understanding that
adjustment for appropriate sets of covariates W may be necessary to achieve this
independence (see footnote 7), that Z may represent a vector of variables, and that
integrals should replace summations when dealing with continuous variables (Imai, Keele, &
Tingley, 2010).

The Mediation Formula (8) represents the average increase in the outcome Y that the
transition from X = x to X = x' is expected to produce absent any direct effect of X on Y.
When the outcome Y is binary (e.g., recovery, or hiring) the ratio $(1 - IE/TE)$ represents
the fraction of responding individuals that is owed to direct paths, while $(1 - DE/TE)$
represents the fraction owed to Z-mediated paths. (A response is “owed” to a path if it
would not have occurred were it not for the mechanism represented by that path.) These two
groups are not necessarily mutually exclusive, as can be seen in our enzyme example;
individuals who respond only in the presence of both the enzyme and the drug should owe
their response to both the direct and indirect paths. In linear systems, where the two
fractions correspond to $1 - \beta\gamma/\tau$ and $1 - \alpha/\tau$, respectively, they
add up to one (see Eq. (1)) and the latter goes by the name “proportion mediated”
(MacKinnon, 2008, p. 82).

3.1 Estimating mediation effects

The Mediation Formula (8) tells us that IE depends only on the conditional expectation of
Y, not on its distribution. It calls therefore for a two-step regression which, in
principle, can be performed nonparametrically. In the first step we estimate the
conditional expectation

$$g(x, z) = E(Y \mid x, z) \tag{10}$$

for every (x, z) cell. In the second step we fix x and regard g(x, z) as a function
$g_x(z)$ of Z. We now estimate the conditional expectation of $g_x(z)$, conditional on
X = x' and X = x, respectively, and take the difference

$$IE_{x,x'}(Y) = E_{Z|X}[g_x(z) \mid x'] - E_{Z|X}[g_x(z) \mid x]. \tag{11}$$
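The two-step procedure can be sketched as a plug-in estimator for discrete X and Z. This is
one possible nonparametric implementation under stated assumptions, not a prescribed
algorithm: the function name `estimate_ie`, the simulated binary data, and the "true"
parameter values are all hypothetical. Step 1 estimates g(x, z) by cell means; step 2
averages $g_x(z)$ over the empirical distributions of Z given x' and given x.

```python
import numpy as np

def estimate_ie(x, z, y, x0=0, x1=1):
    """Plug-in estimate of Eq. (11) for discrete X and Z (arrays of equal length)."""
    ie = 0.0
    for zv in np.unique(z):
        cell = (x == x0) & (z == zv)
        if not cell.any():
            # g(x0, zv) cannot be estimated from an empty cell; skipping it is a
            # simplification that would bias the estimate if P(zv | x1) > 0.
            continue
        g_x0_z = y[cell].mean()              # step 1: g(x0, z) = E(Y | x0, z)
        p_z_x1 = (z[x == x1] == zv).mean()   # step 2: P(z | x1)
        p_z_x0 = (z[x == x0] == zv).mean()   #         P(z | x0)
        ie += g_x0_z * (p_z_x1 - p_z_x0)
    return ie

# Simulated confounding-free binary model with hypothetical parameters.
rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 2, size=n)                                   # randomized X
z = (rng.random(n) < np.where(x == 1, 0.75, 0.40)).astype(int)   # P(Z=1 | x)
g_true = np.array([[0.2, 0.3], [0.4, 0.8]])                      # E(Y | x, z)
y = (rng.random(n) < g_true[x, z]).astype(float)

ie_hat = estimate_ie(x, z, y)
ie_true = 0.2 * (0.25 - 0.60) + 0.3 * (0.75 - 0.40)              # Eq. (8) on true values
print(ie_hat, ie_true)
```

With a sample of this size the estimate should land close to the value implied by the true
parameters; the cell-mean approach stops being practical once Z has many levels or
components, which motivates the parametric alternatives discussed next.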

Nonparametric estimation is not always practical. When Z consists of a vector of several
mediators, the dimensionality of the problem might prohibit the estimation of
$E(Y \mid x, z)$ for every (x, z) cell, and the need may arise to use parametric or
semi-parametric approximations. We can then choose an appropriate parametric form for
$E(Y \mid x, z)$ (e.g., linear, logit, probit), estimate the parameters separately (e.g.,
by regression or maximum likelihood methods), insert the parametric approximation into (8)
and estimate its two conditional expectations (over z) to get the mediated effect
(VanderWeele, 2009).

Both  parametric  and  nonparametric  methods  will  be  demonstrated  in  the  next  three 
subsections. 
3.2 The linear case

Let us examine what the Mediation Formula yields when applied to the linear version of our
model, shown in Fig. 1(a):

$$x = a_0 + \epsilon_1, \qquad z = b_0 + \beta x + \epsilon_2, \qquad y = c_0 + \alpha x + \gamma z + \epsilon_3 \tag{12}$$

with $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$ uncorrelated, zero-mean error terms and
$a_0$, $b_0$, $c_0$ the regression intercepts. Computing the conditional expectation in (8)
gives $E(Y \mid x, z) = c_0 + \alpha x + \gamma z$, and yields

$$IE_{x,x'}(Y) = \sum_z (c_0 + \alpha x + \gamma z)[P(z \mid x') - P(z \mid x)] = \gamma[E(Z \mid x') - E(Z \mid x)] \tag{13}$$

$$= (x' - x)(\beta\gamma) = (x' - x)(\tau - \alpha) \tag{14}$$

where $\tau$ is the slope of the total effect:

$$\tau = (E(Y \mid x') - E(Y \mid x))/(x' - x) = \alpha + \beta\gamma. \tag{15}$$

We thus obtain the standard expressions for indirect effects in linear systems, which can
be estimated either as a difference $\tau - \alpha$ of two regression coefficients or as a
product $\beta\gamma$ of two regression coefficients (see MacKinnon, Lockwood, et al.,
2007). These two strategies do not generalize to nonlinear systems (Pearl, 2011a), as will
be shown next.
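The steps from Eq. (13) to Eq. (14) can be spelled out as follows. This is a worked sketch
that assumes $E(\epsilon_2 \mid x) = 0$, which holds when $\epsilon_2$ has zero mean and is
independent of $\epsilon_1$, so that $E(Z \mid x) = b_0 + \beta x$.

```latex
\begin{align*}
\sum_z (c_0 + \alpha x)\,[P(z \mid x') - P(z \mid x)]
  &= (c_0 + \alpha x)(1 - 1) = 0, \\
\sum_z \gamma z\,[P(z \mid x') - P(z \mid x)]
  &= \gamma\,[E(Z \mid x') - E(Z \mid x)], \\
E(Z \mid x') - E(Z \mid x)
  &= (b_0 + \beta x') - (b_0 + \beta x) = \beta\,(x' - x).
\end{align*}
```

The terms that do not depend on z drop out because each conditional distribution sums to
one, leaving $IE_{x,x'}(Y) = \beta\gamma\,(x' - x)$.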

3.3  Linear  models  with  interaction 

To understand the difficulty, assume that the correct model behind the data contains a
product term xz in the equation for y:

$$y = c_0 + \alpha x + \gamma z + \delta xz + \epsilon_3,$$

a nonlinear model explored by many researchers (Judd & Kenny, 1981; Jo, 2008; Kraemer,
Kiernan, Essex, & Kupfer, 2008; MacKinnon, 2008). Further assume that we correctly account
for this added term and, through diligent analysis on a large data set, we obtain accurate
estimates of all parameters in this model. It is still not clear what combinations of
parameters measure the direct and indirect effects of X on Y, or, more specifically, how to
assess the fraction of the total effect that is explained by mediation and the fraction
that is owed to mediation.^8 In linear analysis, the former fraction is captured by the
product $\beta\gamma/\tau$, the latter by the difference $(\tau - \alpha)/\tau$ (Eq. 14)
and the two quantities coincide. In the presence of interaction, however, each fraction
demands a separate analysis.

To see this, substituting the nonlinear equation in (4), (6), and (8) and assuming x = 0
and x' = 1 yields the following decomposition:

$$DE_{0,1} = \alpha + b_0\delta, \qquad IE_{0,1} = \beta\gamma$$

$$TE_{0,1} = \alpha + b_0\delta + \beta(\gamma + \delta) = DE_{0,1} + IE_{0,1} + \beta\delta$$

^8 By “explain” we mean “sufficient to sustain even in the absence of direct effect.” By
“owed to” we mean “would not occur absent mediation.” These interpretations follow from
the counterfactual definitions formulated in Section 2, of which Eqs. (6) and (8) are
derived statistical estimands.
We conclude that the portion of output change for which mediation would be sufficient is
$IE_{0,1} = \beta\gamma$, while the portion for which mediation would be necessary is
$TE_{0,1} - DE_{0,1} = \beta\gamma + \beta\delta$. In other words, the strength of
mediation as measured by the size of the indirect effect, $\beta\gamma$, is not the same as
that measured by subtracting the direct effect, $\beta(\gamma + \delta)$. The difference,
$\beta\delta$, is caused by the interaction term $\delta xz$ and is not affected by
$\alpha$. We further note that, if two populations differ only in the $\beta$ parameter,
they will have the same direct effect, DE, but will differ in the difference TE − IE. The
former measures the portion explained by the direct effect and the latter the portion owed
to the direct effect. These conclusions are not readily discernible from the structural
equations without the guidance of the Mediation Formula and, indeed, they have not been
addressed in previous analyses of this interaction model (Judd & Kenny, 1981; Jo, 2008;
Kraemer et al., 2008; MacKinnon, 2008; Judd & Kenny, 2010).
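The decomposition for the interaction model can be checked numerically by evaluating
Eqs. (4), (6), and (8) directly. The sketch below is illustrative only: the parameter
values are arbitrary, and the two-point distribution for the mediator's error term is an
assumption made so that the expectations over Z can be computed exactly.

```python
# Arbitrary illustrative parameters for y = c0 + alpha*x + gamma*z + delta*x*z + e3
# and z = b0 + beta*x + e2.
c0, alpha, gamma, delta, b0, beta = 1.0, 0.5, 2.0, 0.7, 0.3, 1.5

def g(x, z):
    """Conditional expectation E(Y | x, z) in the interaction model."""
    return c0 + alpha * x + gamma * z + delta * x * z

def mean_over_z(f, x_for_z):
    """E[f(Z) | X = x_for_z], assuming e2 is -1 or +1 with equal probability."""
    return sum(0.5 * f(b0 + beta * x_for_z + e2) for e2 in (-1.0, 1.0))

# Eq. (6): change x from 0 to 1 while Z keeps its distribution under x = 0.
de = mean_over_z(lambda z: g(1, z) - g(0, z), 0)
# Eq. (8): hold x at 0 while the distribution of Z shifts from x = 0 to x = 1.
ie = mean_over_z(lambda z: g(0, z), 1) - mean_over_z(lambda z: g(0, z), 0)
# Eq. (4): total effect of the transition from 0 to 1.
te = mean_over_z(lambda z: g(1, z), 1) - mean_over_z(lambda z: g(0, z), 0)

print(de, alpha + b0 * delta)          # direct effect vs. alpha + b0*delta
print(ie, beta * gamma)                # indirect effect vs. beta*gamma
print(te - de - ie, beta * delta)      # interaction remainder vs. beta*delta
```

Each printed pair should agree up to floating-point error, and the last line isolates the
term $\beta\delta$ that separates the "sufficient" from the "necessary" reading of
mediation.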

We  note  that,  due  to  interaction,  a  direct  effect  can  be  sustained  even  when  the  parameter 
a  vanishes,  and  a  total  effect  can  be  sustained  even  when  both  the  direct  and  indirect  effects 
vanish.  This  illustrates  that  estimating  parameters  in  isolation  tells  us  little  about  the  effect 
of  mediation  and,  more  generally,  mediation  and  moderation  are  intertwined;  each  mediator 
can  act  as  a  moderator  and  each  moderator,  if  affected  by  X,  must  act  as  a  mediator  as  well. 
Although  the  degree  of  moderation  can  be  assessed  separately  from  that  of  mediation,  it  is 
not  necessary  to  base  the  assessment  of  one  on  the  assumption  that  the  other  is  absent,  as 
suggested by some writers (Baron & Kenny, 1986; Kraemer et al., 2008).⁹
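The two claims about interaction can be made concrete by specializing the closed-form expressions of the interaction model. The parameter settings below are chosen purely for illustration and are not taken from the source.

```latex
\begin{align*}
\text{General case:}\quad
  & DE_{0,1} = \alpha + b_0\delta, \qquad
    IE_{0,1} = \beta\gamma, \qquad
    TE_{0,1} = DE_{0,1} + IE_{0,1} + \beta\delta \\[4pt]
\alpha = 0:\quad
  & DE_{0,1} = b_0\delta \neq 0
    \quad \text{whenever } b_0 \neq 0 \text{ and } \delta \neq 0 \\[4pt]
\alpha = 0,\; b_0 = 0,\; \gamma = 0:\quad
  & DE_{0,1} = 0, \qquad IE_{0,1} = 0, \qquad
    TE_{0,1} = \beta\delta \neq 0
    \quad \text{whenever } \beta \neq 0 \text{ and } \delta \neq 0
\end{align*}
```

In the last case the entire total effect is carried by the interaction: neither pathway can sustain it alone, yet suppressing either one eliminates it.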

If the policy evaluated aims to prevent the outcome Y by way of weakening the mediating
pathways, the target of analysis should be the difference TE − DE, which measures the
highest prevention potential of any such policy. This maximum will be realized when the
mediating path is totally suppressed, thus reducing the total effect from TE to DE, hence
the difference TE − DE. If, on the other hand, the policy aims to prevent the outcome
by weakening the direct pathway, the target of analysis should shift to IE, for TE − IE
measures the highest preventive potential of this type of policy.

3.4  The  binary  case 

The  power  of  the  Mediation  Formula  shines  in  studies  involving  categorical  variables.  To 
illustrate, we consider the case where all variables are binary; generalizations to multi-valued
variables are straightforward. The low dimensionality of the binary case permits a
nonparametric  solution  and  an  explicit  demonstration  of  how  mediation  can  be  estimated 
directly  from  the  data. 

⁹The degree of moderation exerted by any variable Z is measured by the difference between the controlled
direct effects at two levels of Z, $CDE(Z = z_1) - CDE(Z = z_0)$ (see Eq. (3)). As in mediation, when
confounding is present, an unbiased estimation of moderation requires adjustments for covariates that can
be identified by graphical methods (see footnote 5).

Table 1: Computing the Mediation Formula from empirical data, for the model in Fig. 2,
with X, Y, Z binary.

Assume that the observed data is given by Table 1. The factors $E(Y \mid x, z) = g_{x,z}$ and
$E(Z \mid x) = h_x$ can be readily estimated, as shown in the two right-most columns of Table 1
and, when substituted in (4), (6), and (8), yield

$DE = (g_{1,0} - g_{0,0})(1 - h_0) + (g_{1,1} - g_{0,1})h_0$  (16)

$IE = (h_1 - h_0)(g_{0,1} - g_{0,0})$  (17)

$TE = g_{1,1}h_1 + g_{1,0}(1 - h_1) - [g_{0,1}h_0 + g_{0,0}(1 - h_0)]$  (18)

We  see  that  logistic  or  probit  regression  is  not  necessary;  simple  arithmetic  operations  suffice 
to  provide  a  general  solution  for  any  data  set,  regardless  of  the  data-generating  process. 
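A minimal sketch of this arithmetic follows, assuming the data arrive as a list of binary (x, z, y) triples from a study without unmeasured confounding. The function names and the toy records are hypothetical and serve only to illustrate the procedure: estimate g and h by sample proportions, then substitute them into Eqs. (16)-(18).

```python
from collections import defaultdict


def estimate_factors(records):
    """Estimate g[x, z] = E(Y | x, z) and h[x] = E(Z | x) by sample means."""
    y_sum, n_xz = defaultdict(float), defaultdict(int)
    z_sum, n_x = defaultdict(float), defaultdict(int)
    for x, z, y in records:
        y_sum[x, z] += y
        n_xz[x, z] += 1
        z_sum[x] += z
        n_x[x] += 1
    g = {cell: y_sum[cell] / n_xz[cell] for cell in n_xz}
    h = {x: z_sum[x] / n_x[x] for x in n_x}
    return g, h


def mediation_effects(g, h):
    """Plug the estimated factors into Eqs. (16)-(18)."""
    de = (g[1, 0] - g[0, 0]) * (1 - h[0]) + (g[1, 1] - g[0, 1]) * h[0]   # Eq. (16)
    ie = (h[1] - h[0]) * (g[0, 1] - g[0, 0])                              # Eq. (17)
    te = (g[1, 1] * h[1] + g[1, 0] * (1 - h[1])
          - (g[0, 1] * h[0] + g[0, 0] * (1 - h[0])))                      # Eq. (18)
    return de, ie, te


# Toy records (x, z, y); every (x, z) cell must be observed at least once.
records = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),
    (1, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1),
]
g, h = estimate_factors(records)
de, ie, te = mediation_effects(g, h)
print(f"DE = {de:.3f}, IE = {ie:.3f}, TE = {te:.3f}")
```

No regression model is fitted at any point; the only requirement in this sketch is that each of the four (x, z) cells contains data.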

3.5  Numerical  example 

To  anchor  these  formulas  in  a  concrete  example,  let  us  assume  that  X  =  1  stands  for  a 
drug  treatment,  Y  =  1  for  recovery,  and  Z  =  1  for  the  presence  of  a  certain  enzyme  in  a 
patient’s  blood  which  appears  to  be  stimulated  by  the  treatment.  Assume  further  that  the 
data  described  in  Tables  2  and  3  was  obtained  in  a  randomized  clinical  trial  and  that  all 
omitted factors ($\epsilon_2$ and $\epsilon_3$ in Fig. 2) are judged to be independent. Our research question is
what role Z plays in transmitting the action of X on Y or, more specifically, to what
extent enhanced secretion of the enzyme assists the remedial action of the drug on recovery.

To  further  motivate  the  example,  we  note  that  the  question  above  is  far  from  being 
hypothetical,  but  comes  up  often  in  planning  and  decision  making.  For  example,  suppose 
someone  proposes  the  development  of  a  much  cheaper  drug,  equal  in  all  respects  to  the 
one under study, save for lacking any effect on enzyme production. To determine the
cost-benefit tradeoffs of the proposed development, we ask what reduction in efficacy, TE − DE,
is  expected  from  the  proposed  new  drug. 

Substituting  this  data  into  Eqs.  (16)-(18)  yields: 

DE = (0.40 - 0.20)(1 - 0.40) + (0.80 - 0.30)(0.40) = 0.32

IE = (0.75 - 0.40)(0.30 - 0.20) = 0.035

TE = 0.80 x 0.75 + 0.40 x 0.25 - (0.30 x 0.40 + 0.20 x 0.60) = 0.46

IE/TE = 0.07,  DE/TE = 0.696,  1 - DE/TE = 0.304
Treatment X   Enzyme present Z   Percentage cured, $g_{x,z} = E(Y \mid x, z)$
YES           YES                $g_{1,1}$ = 80%
YES           NO                 $g_{1,0}$ = 40%
NO            YES                $g_{0,1}$ = 30%
NO            NO                 $g_{0,0}$ = 20%

Table 2:

Treatment X   Percentage with Z present
NO            $h_0$ = 40%
YES           $h_1$ = 75%

Table 3:

We conclude that 30.4% of all recoveries are owed to the capacity of the treatment to enhance
the secretion of the enzyme,¹⁰ while only 7% of recoveries would be sustained by enzyme
enhancement alone. The enzyme seems to act more as a catalyst for the healing process of
X than as a healing agent in its own right. The policy implication of such a study would be
that efforts to develop a cheaper drug, identical to the one studied but lacking the potential
to stimulate enzyme secretion, would face a reduction of 30.4% in recovery cases. More
decisively, proposals to substitute the drug with one that merely mimics its stimulant action
on Z but has no direct effect on Y are bound for failure; the drug evidently has a beneficial
effect on recovery that is independent of, though enhanced by, enzyme stimulation.
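The policy reading above can be checked by computing the recovery rate expected under each hypothetical drug. The sketch below assumes the no-confounding conditions stated for the trial. The helper `recovery_rate` and the variable names are illustrative, and only the percentages come from Tables 2 and 3.

```python
# Conditional expectations read from Tables 2 and 3.
g = {(1, 1): 0.80, (1, 0): 0.40, (0, 1): 0.30, (0, 0): 0.20}  # g[x, z] = E(Y | x, z)
h = {0: 0.40, 1: 0.75}                                        # h[x]    = E(Z | x)


def recovery_rate(x_direct, x_enzyme):
    """Recovery rate when the drug's direct action is as under X = x_direct
    while the enzyme is distributed as it would be under X = x_enzyme."""
    p_z1 = h[x_enzyme]
    return g[x_direct, 1] * p_z1 + g[x_direct, 0] * (1 - p_z1)


untreated = recovery_rate(0, 0)          # no drug at all
full_drug = recovery_rate(1, 1)          # the drug as studied
no_enzyme_effect = recovery_rate(1, 0)   # hypothetical cheaper drug, no effect on Z
enzyme_only = recovery_rate(0, 1)        # hypothetical drug that only stimulates Z

te = full_drug - untreated
de = no_enzyme_effect - untreated
ie = enzyme_only - untreated

print(f"TE = {te:.3f}, DE = {de:.3f}, IE = {ie:.3f}")
print(f"share of TE lost without enzyme stimulation: {(te - de) / te:.3f}")
print(f"share of TE sustained by enzyme stimulation alone: {ie / te:.3f}")
```

The three effects should agree with DE = 0.32, IE = 0.035, and TE = 0.46. The last ratio comes out near 0.076, which the text reports as 7%.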

It  is  instructive  at  this  point  to  unfold  the  intuition  behind  the  IE  formula  (Eqs.  (8)  or 
(17))  as  reflected  in  our  numerical  example.  Our  task  is  to  evaluate  the  fraction  of  recoveries 
that  would  be  sustained  solely  by  the  drug  enhancement  of  enzyme  secretion  absent  any  other 
effect  the  drug  may  have  on  the  outcome.  Table  2  reveals  that,  under  no  drug  condition, 
a subject carrying the enzyme has a (0.30 − 0.20) greater chance of recovering than one
without the enzyme. Table 3 shows that the drug increases the proportion of the former
subjects by (0.75 − 0.40). Therefore, multiplying the increase in enzyme counts (0.75 − 0.40)
by the increase in cure rate (0.30 − 0.20) gives IE = 0.035, which is about 7% of the total
effect  (TE). 


4  Relations  to  Traditional  Approaches 

Conventional methods do not define direct and indirect effects in nonlinear settings where
the underlying process is unknown, nor do they agree on a principle for defining those effects
when the process is known. MacKinnon (2008, Ch. 11), for example, analyzes categorical
data  using  logistic  and  probit  regressions  and  constructs  effect  measures  using  products  and 
differences  of  the  parameters  in  those  models.  These  measures  are  not  compatible  with 
the  causal  interpretation  of  effect  measures,  even  when  the  parameters  are  precisely  known; 
IE  and  DE  may  be  extremely  complicated  functions  of  those  regression  coefficients  (Pearl, 
2011a).  Fortunately,  those  coefficients  need  not  be  estimated  at  all;  mediation  measures 
can be estimated directly from the data using Eqs. (16)-(18), circumventing the parametric analysis
altogether. 

Attempts to extend the difference and product heuristics to nonparametric analysis have
encountered ambiguities that the Mediation Formula can now resolve.

¹⁰These percentages refer to population-level proportions, not to individuals. It is quite possible that more
than 30.4% of those recovered would remain ill without enhanced enzyme secretion, if a balancing group of
uncured patients would actually gain recovery as a result of no enhancement.

The product-of-coefficients heuristic advises us to multiply the unit effect of X on Z

$C_\beta = E(Z \mid X = 1) - E(Z \mid X = 0) = h_1 - h_0$

by the unit effect of Z on Y given X,

$C_\gamma = E(Y \mid X = x, Z = 1) - E(Y \mid X = x, Z = 0) = g_{x,1} - g_{x,0}$

but does not specify on what value we should condition X. Equation (17) resolves this
ambiguity: $C_\gamma$ should be conditioned on X = 0 in order for the product $C_\beta C_\gamma$ to yield the
correct mediation measure, IE.

The difference-in-coefficients heuristic instructs us to estimate the direct effect coefficient

$\alpha = E(Y \mid X = 1, Z = z) - E(Y \mid X = 0, Z = z) = g_{1,z} - g_{0,z}$

and subtract it from the total effect, but does not specify on what value we should condition
Z. Equation (16) determines that we should condition on both Z = 0 and Z = 1 and take
their weighted average, using $h_0 = P(Z = 1 \mid X = 0)$ as the weighting function.

To summarize, the Mediation Formula dictates that, in calculating DE, we should condition
on both Z = 1 and Z = 0 and average while, in calculating IE, we should condition
on only one value, X = 0, and no average need be taken.

The difference and product heuristics are both legitimate, with each seeking a different effect
measure. The difference-in-coefficients heuristic, leading to TE − DE, seeks to measure
the fraction of the response for which mediation was necessary. The product-of-coefficients
heuristic, leading to IE, seeks to estimate the fraction of response for which mediation
would be sufficient. The former informs policies aimed at suppressing mediating pathways;
the latter informs those aimed at suppressing direct pathways.
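To see how much the unspecified conditioning value matters, the following sketch evaluates both heuristics under every conditioning choice, using the percentages of Tables 2 and 3. The dictionary layout and variable names are illustrative assumptions.

```python
g = {(1, 1): 0.80, (1, 0): 0.40, (0, 1): 0.30, (0, 0): 0.20}  # g[x, z] = E(Y | x, z)
h = {0: 0.40, 1: 0.75}                                        # h[x]    = P(Z = 1 | x)

# Product-of-coefficients: unit effect of X on Z times unit effect of Z on Y at each x.
c_beta = h[1] - h[0]
c_gamma = {x: g[x, 1] - g[x, 0] for x in (0, 1)}
product = {x: c_beta * c_gamma[x] for x in (0, 1)}

# Difference-in-coefficients: direct-effect coefficient at each z, subtracted from TE.
coef = {z: g[1, z] - g[0, z] for z in (0, 1)}
te = (g[1, 1] * h[1] + g[1, 0] * (1 - h[1])
      - (g[0, 1] * h[0] + g[0, 0] * (1 - h[0])))
difference = {z: te - coef[z] for z in (0, 1)}

# Choices selected by the Mediation Formula, Eqs. (17) and (16).
ie = product[0]                                   # condition on X = 0
de = (1 - h[0]) * coef[0] + h[0] * coef[1]        # average over Z with weights from X = 0

print("product heuristic at x = 0, 1:   ", product)
print("difference heuristic at z = 0, 1:", difference)
print(f"IE = {ie:.3f}, TE - DE = {te - de:.3f}")

# For binary Z, expanding Eqs. (16) and (18) gives TE - DE = (h1 - h0)(g11 - g10),
# i.e. the product heuristic conditioned on X = 1.
assert abs((te - de) - product[1]) < 1e-12
```

On these data the product heuristic ranges from 0.035 to 0.14 and the difference heuristic from -0.04 to 0.26, depending on the conditioning value, so the choice is far from cosmetic.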

In addition to providing causally sound estimates for mediation effects, the Mediation
Formula also enables researchers to evaluate analytically the effectiveness of various parametric
specifications relative to any assumed model (Pearl, 2011a; Imai, Keele, & Tingley,
2010). This type of analytical “sensitivity analysis” could not previously be applied to mediation
analysis, owing to the absence of an objective target quantity that captures the notion of indirect
effect in nonlinear systems, free of parametric assumptions. The Mediation Formula of Eq.
(8) explicates this target quantity.

The power of the Mediation Formula was recognized by Petersen et al. (2006); Glynn
(2009); VanderWeele and Vansteelandt (2009); Hafeman and Schwartz (2009); Mortensen,
Diderichsen, Smith, and Andersen (2009); VanderWeele (2009); Kaufman (2010); Imai,
Keele, and Tingley (2010). Imai, Keele, and Yamamoto (2010) have further shown that
nonparametric identification of mediation effects allows for a flexible estimation strategy
and illustrate this with various nonlinear models, quantile regressions, and generalized additive
models. Imai, Keele, Tingley, and Yamamoto (2010) describe an implementation of
these extensions using a convenient R package. Sjolander (2009) provides bounds on DE in
cases where the confounders between Z and Y cannot be controlled.

The ability of the Mediation Formula to carry us across the linear-nonlinear barrier
may suggest that, similar to traditional path analysis in linear systems, we can now assess
(nonparametrically) the mediating effect of any chosen path or a bundle of paths in a causal
diagram (Alwin & Hauser, 1975; Bollen, 1989). This turned out not to be the case. Avin
et al. (2005) showed that there are many bundles of paths (i.e., subgraphs) in a graph G
whose mediation effects cannot be assessed from either observational or experimental studies,
even in the absence of unobserved confounders. They proved that the mediation effect of a
subgraph SG is estimable if and only if it contains no “broken fork,” that is, a path $p_1$ from
X to some vertex W, and two paths, $p_2$ and $p_3$, from W to Y, such that $p_1$ and $p_2$ are in
SG and $p_3$ is in G but not in SG.

Clearly, a broken fork condition cannot occur in the graph of Fig. 1(b), which enables us
to assess the mediation effect of any subset of $\{Z_1, Z_2, Z_3\}$. However, if we add the arrow
$Z_3 \to Z_2$, then the effect contributed by the path $X \to Z_3 \to Z_2 \to Y$ would not be
estimable, because removing the path $p_3: Z_3 \to Y$ from the evaluated subgraph creates
a “broken fork.” The controlled direct effect, in contrast, is always estimable when there
are no unmeasured confounders and, for any graph and any subset Z, it is given by the
truncated-factorization formula (Pearl, 2009, p. 72).
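The broken-fork condition, as worded above, lends itself to a simple reachability check. The sketch below is one possible reading of that wording, not the formal criterion of Avin et al. It assumes W is an intermediate vertex distinct from X and Y, represents graphs as sets of directed edges, and uses a hypothetical diagram that need not coincide with Fig. 1(b).

```python
def reachable(edges, start):
    """Vertices reachable from `start` along directed edges (start included)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
    seen, stack = {start}, [start]
    while stack:
        for v in adjacency.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def has_broken_fork(g_edges, sg_edges, x="X", y="Y"):
    """True if some W has p1: X ~> W and p2: W ~> Y inside SG,
    plus p3: W ~> Y in G that uses at least one edge outside SG."""
    g_edges, sg_edges = set(g_edges), set(sg_edges)
    outside = g_edges - sg_edges
    for w in reachable(sg_edges, x) - {x, y}:          # p1 exists within SG
        if y not in reachable(sg_edges, w):            # p2 must exist within SG
            continue
        from_w = reachable(g_edges, w)
        for u, v in outside:                           # p3: W ~> u -> v ~> Y
            if u in from_w and y in reachable(g_edges, v):
                return True
    return False


# Hypothetical diagram: two parallel mediators plus a direct arrow.
G = {("X", "Z2"), ("Z2", "Y"), ("X", "Z3"), ("Z3", "Y"), ("X", "Y")}
via_z3 = {("X", "Z3"), ("Z3", "Y")}
print(has_broken_fork(G, via_z3))                      # no fork at Z3

# Adding Z3 -> Z2 and evaluating only the path X -> Z3 -> Z2 -> Y.
G_plus = G | {("Z3", "Z2")}
long_path = {("X", "Z3"), ("Z3", "Z2"), ("Z2", "Y")}
print(has_broken_fork(G_plus, long_path))              # Z3 -> Y is left outside
```

Excluding W = X is a design choice of this sketch: without it, the natural direct effect itself would be flagged, which would contradict its estimability in the confounding-free models discussed earlier.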


5 Conclusions

Traditional  methods  of  mediation  analysis  have  been  limited  to  linear  models  or  semi-linear 
regression  models,  and  have  produced  distorted  estimates  of  “mediation  effects”  when  applied 
to  nonlinear  models,  or  models  with  categorical  variables.  This  paper  offers  a  causally  sound 
alternative  that  asymptotically  ensures  bias-free  estimates  while  making  no  assumption  on 
the  distributional  form  of  the  underlying  process. 

We distinguished between the proportion of response cases for which mediation was necessary
and the proportion for which mediation would have been sufficient. Both measures play a role in
mediation analysis, and are given here a formal representation through the Mediation Formula.
This formula is estimable by ordinary regression and provides an objective measure of the
extent to which an effect is mediated through a given mediating path, independent of the
method chosen for estimating that effect. While the validity of the formulas rests on the
same assumptions that are required for standard linear analysis (i.e., no unmeasured
confounders), their general applicability to nonlinear systems, continuous and categorical variables,
and arbitrarily complex interactions renders them a powerful tool for the assessment of causal
pathways in many of the health-related sciences.